This is a more advanced application of DSC with R scripts. In this tutorial we demonstrate several features of DSC2.
The DSC problem is based on the ASH example of DSCR (R Markdown version and HTML version). Material to run this tutorial can be found in the DSC2 vignettes repo. The description below is copied from the DSCR vignette:
To illustrate, we consider the problem of shrinkage, which is tackled by the ashr package at http://www.github.com/stephens999/ashr. The input to this DSC is a set of estimates $\hat\beta$, with associated standard errors $s$. These values are estimates of actual (true) values of $\beta$, so the meta-data in this case are the true values of $\beta$. Methods must take $\hat\beta$ and $s$ as input, and provide as output "shrunk" estimates of $\beta$ (so the output is a list with one element, called beta_est, which is a vector of estimates of $\beta$). The score function then scores methods on their RMSE comparing beta_est with the true $\beta$.
First define a datamaker which simulates true values of $\beta$ from a user-specified normal mixture, where one of the components is a point mass at 0 with mass $\pi_0$, a user-specified parameter. It then simulates $\hat\beta_j \sim N(\beta_j, s_j)$ (where $s_j$ is again user-specified). It returns the true $\beta$ values and true $\pi_0$ value as meta-data, and the estimates $\hat\beta$ and $s$ as input-data.
Now define a method wrapper for the ash function from the ashr package. Notice that this wrapper does not return output in the required format: it simply returns the entire ash output.
Finally, add a generic score function (it can be used to evaluate both $\pi_0$ and $\beta$ estimates) to score the estimates produced by ash.
The problem is fully specified in DSC2 language below, following the structure of the original DSCR implementation:
simulate:
    exec: datamaker.R
    seed: R(1:5)
    params:
        g: Asis(ashr::normalmix(c(2/3,1/3),c(0,0),c(1,2))),
           Asis(ashr::normalmix(rep(1/7,7),
                c(-1.5,-1,-0.5,0,0.5,1,1.5),rep(0.5,7))),
           Asis(ashr::normalmix(c(1/4,1/4,1/3,1/6),
                c(-2,-1,0,1),c(2,1.5,1,1)))
        min_pi0: 0
        max_pi0: 1
        nsamp: 1000
        betahatsd: 1
        .alias: args = List()
    return: data, true_beta = R(data$meta$beta),
            true_pi0 = R(data$meta$pi0)

shrink:
    exec: runash.R
    params:
        input: $data
        mixcompdist: normal, halfuniform
    return: ash_data, beta_est = R(ashr::get_pm(ash_data)),
            pi0_est = R(ashr::get_pi0(ash_data))

beta_score:
    exec: score.R
    .alias: score_beta
    params:
        beta_true: $true_beta
        beta_est: $beta_est
        .alias: est = beta_est, truth = beta_true
    return: result

pi0_score:
    exec: score.R
    .alias: score_pi0
    params:
        pi0_est: $pi0_est
        pi0: $true_pi0
        .alias: est = pi0_est, truth = pi0
    return: result

DSC:
    run: simulate *
         shrink *
         (beta_score, pi0_score)
    R_libs: stephens999/ashr (2.0.0+)
    exec_path: bin
    output: dsc_result
This is a more complicated example. We suggest that you walk through every DSC block, cross-referencing the corresponding R code for the datamaker, the method wrapper and the score function, to see how DSC2 communicates with your R programs.
Next we will walk through each DSC block and highlight some notable features.
simulate

The parameter g has three candidate values, all of which are R expressions wrapped in the Asis() operator. Content inside Asis() is interpreted as a functional code piece rather than a string. In other words, DSC2 interprets the first candidate as g <- ashr::normalmix(c(2/3,1/3),c(0,0),c(1,2)), so that g is assigned the output of the R code inside Asis() for use with datamaker.R. Without Asis(), the line would be interpreted as a string assigned to g, which would cause trouble.
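To make the contrast concrete, here is a side-by-side illustration (not taken from the vignette) of the same parameter with and without Asis():

```
# with Asis(): g is bound to the ashr::normalmix object returned by the call
g: Asis(ashr::normalmix(c(2/3,1/3),c(0,0),c(1,2)))

# without Asis(): g would be bound to the literal string
# "ashr::normalmix(c(2/3,1/3),c(0,0),c(1,2))", which datamaker.R cannot use
g: ashr::normalmix(c(2/3,1/3),c(0,0),c(1,2))
```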
Inside datamaker.R, the core function takes a single argument: an R list containing all parameters specified in this block. The .alias entry uses a special DSC2 operator, List(), to consolidate these parameters into an R list named args, which corresponds to the input argument of datamaker.R.
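As a hedged illustration of this calling convention, consider the simplified base-R sketch below. It is not the actual datamaker.R from the vignette; in particular it ignores args$g and draws the true effects from a standard normal instead of the specified mixture.

```r
# Simplified sketch of the datamaker calling convention, NOT the vignette's
# datamaker.R. Because of `.alias: args = List()`, DSC2 passes every
# parameter of the `simulate` block in a single R list named `args`.
datamaker <- function(args) {
  nsamp <- args$nsamp
  # The real code draws beta from the normal mixture in args$g;
  # a standard normal is used here as a stand-in.
  beta <- rnorm(nsamp)
  betahat <- rnorm(nsamp, mean = beta, sd = args$betahatsd)
  list(meta  = list(beta = beta, pi0 = args$min_pi0),
       input = list(betahat = betahat,
                    sebetahat = rep(args$betahatsd, nsamp)))
}

dat <- datamaker(list(nsamp = 1000, min_pi0 = 0, betahatsd = 1))
length(dat$input$betahat)
```

The returned meta/input structure mirrors the split between meta-data (true values) and input-data (estimates) described above.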
The return object is data, consistent with the code in datamaker.R. However, we want to extract some variables from data for use in other steps. This is achieved by return aliases, which create new variables from existing values. The R() operator is used here to extract information from existing objects using R syntax.
shrink

Here, notice the return alias pi0_est, which calls the get_pi0 function inside the R() operator to extract information from the existing output ash_data.
beta_score & pi0_score

These two blocks use the same computational routine, score.R, but on different input data. Adjustments have to be made via .alias to distinguish the two blocks in the DSC output and to match the input variable names expected by score.R. The executable .alias renames the computational routine from the generic score.R to score_beta and score_pi0 respectively. These routine names become part of the column names in the DSC output database, so they should be distinct. The parameter .alias converts the input variable names to the variable names used in score.R. It is possible to use those names directly, e.g.,
beta_score:
    ...
    params:
        truth: $true_beta
        est: $beta_est
    ...
and the DSC will still work. Here, for clarity and to demonstrate parameter aliases, we use informative parameter names in the DSC, and convert them to what the R code requires via .alias.
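To see why a single routine suffices for both scores, here is a hedged base-R sketch of a generic scoring function; the actual score.R in the vignette may differ in its details, but the RMSE criterion is taken from the problem description above.

```r
# Hypothetical sketch of a generic scoring routine in the spirit of score.R.
# It only needs two vectors, `truth` and `est`, which is why the same routine
# can score both beta and pi0 once `.alias` maps each block's parameter
# names onto these generic names.
score <- function(truth, est) {
  sqrt(mean((truth - est)^2))  # root mean squared error (RMSE)
}

score(truth = c(0, 1, 2), est = c(0.1, 0.9, 2.2))
```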
Notice too that, unlike the DSCR ASH example, the output score here is a simple value (a float, the RMSE). If the outcome of the R code is not a simple object, for example if it returns a list variable score_output, then you may want to use a return alias to extract the important information into simple values so that they are readily extractable downstream, e.g., return: score_output, mse = R(score_output$mse).
DSC section

The DSC::run entry executes two sequences, of the kind we have discussed in previous tutorials. The R_libs entry specifies the R package required by the DSC. It is formatted as a GitHub package (repo/pkg), and the minimal version requirement is 2.0.0. DSC first checks whether the package is available, and installs it if necessary. It then checks the package version, and quits with an error if the version does not satisfy the requirement; DSC does not attempt to change an installed package on a version mismatch.
This diagram (generated by the dot command from the execution graph of this DSC) shows the logic of this benchmark:

! dsc -x settings.dsc -j 8 --seeds "R(1:50)" -f
The DSCR ASH example gives names to the various simulation settings and methods. Here we use a DSC annotation file to reproduce the DSCR example. We create a settings.ann file as follows:
An:
    simulate:
        g: Asis(ashr::normalmix(c(2/3,1/3),c(0,0),c(1,2)))
Bn:
    simulate:
        g: Asis(ashr::normalmix(rep(1/7,7),c(-1.5,-1,-0.5,0,0.5,1,1.5),rep(0.5,7)))
Cn:
    simulate:
        g: Asis(ashr::normalmix(c(1/4,1/4,1/3,1/6),c(-2,-1,0,1),c(2,1.5,1,1)))
ash_n:
    shrink:
        mixcompdist: normal
ash_nu:
    shrink:
        mixcompdist: halfuniform
and we apply this annotation to our benchmark:
! dsc -a settings.ann
! dsc -e pi0_score:result --target pi0_score -o ashr_pi0_1.rds \
--tags "case1 = An && ash_n" "case2 = An && ash_nu"
We can examine the result in R, similar to what we have done in the Quick Start example:
%use ir
dat = readRDS('ashr_pi0_1.rds')
case1 = unlist(dat$case1_pi0_score_result)
case2 = unlist(dat$case2_pi0_score_result)
print(c(mean(case1), mean(case2)))
suppressMessages(library(plotly))
p = plot_ly(y = case1, name = 'case 1', type = "box") %>%
add_trace(y = case2, name = 'case 2', type = "box") %>%
layout(title = "MSE for pi_0 estimate")
htmlwidgets::saveWidget(as.widget(p), "pi0_score.html")
IRdisplay::display_html(paste(readLines("pi0_score.html"), collapse="\n"))
You can also extract quantities of interest from any step in a DSC sequence. For example, suppose we want to compare the MSE of the posterior mean estimates and, at the same time, explore the distribution of the posterior means. We first extract both quantities:
! dsc -e beta_score:result shrink:beta_est --target beta_score -o ashr_beta_1.rds \
--tags "case1 = An && ash_n" "case2 = An && ash_nu"
Then we plot them both:
%use ir
dat = readRDS('ashr_beta_1.rds')
case1 = unlist(dat$case1_beta_score_result)
case2 = unlist(dat$case2_beta_score_result)
case1_beta = rowMeans(data.frame(dat$case1_shrink_beta_est))
case2_beta = rowMeans(data.frame(dat$case2_shrink_beta_est))
#
suppressMessages(library(plotly))
p = plot_ly(y = case1, name = 'case 1', type = "box") %>%
add_trace(y = case2, name = 'case 2', type = "box") %>%
layout(title = "MSE for beta estimate")
htmlwidgets::saveWidget(as.widget(p), "beta_score.html")
IRdisplay::display_html(paste(readLines("beta_score.html"), collapse="\n"))
p = plot_ly(x = case1_beta, name = 'case 1', opacity = 0.9, type = "histogram") %>%
add_trace(x = case2_beta, name = 'case 2', opacity = 0.9, type = "histogram") %>%
layout(title = "Posterior mean distribution")
htmlwidgets::saveWidget(as.widget(p), "beta.html")
IRdisplay::display_html(paste(readLines("beta.html"), collapse="\n"))
You can also benchmark the time it takes to run a computational step. For example:
case1 = unlist(dat$DSC_TIMER$case1_shrink_beta_est)
case2 = unlist(dat$DSC_TIMER$case2_shrink_beta_est)
#
suppressMessages(library(plotly))
p = plot_ly(y = case1, name = 'case 1', type = "box") %>%
add_trace(y = case2, name = 'case 2', type = "box") %>%
layout(title = "Timer elapsed for posterior mean estimation")
htmlwidgets::saveWidget(as.widget(p), "beta_time.html")
IRdisplay::display_html(paste(readLines("beta_time.html"), collapse="\n"))